What is the type-1/type-2 distinction?

Author

  • Nick Chater
Abstract

Clark & Thornton's type-1/type-2 distinction is not well-defined. The classes of type-1 and type-2 problems are too broad: many noncomputable functions are type-1 and type-2 learnable. They are also too narrow: trivial functions, such as identity, are neither type-1 nor type-2 learnable. Moreover, the scope of type-1 and type-2 problems appears to be equivalent. Overall, this distinction does not appear useful for machine learning or cognitive science.

1. Why probabilities? Clark & Thornton (C&T) frame the learning problem as deriving a conditional probability distribution P(Y|X), where X and Y are sets of possible inputs and outputs, from a set of input-output pairs (x, y). This is puzzling, because the learning systems that C&T consider (e.g., feedforward neural networks) produce a single output, given each input, rather than a conditional probability distribution over all possible outputs.¹ Moreover, C&T state that if a pattern (x, y) has been encountered, then P(y|x) = 1 (sect. 2, para. 4), which indicates that they assume the conditional probability distribution is degenerate – that is, for each input there is a single output. So they appear to be concerned not with learning arbitrary conditional probability distributions, but rather with learning functions from input to output.

2. All conditional probability distributions are type-1 learnable. C&T write that a distribution to be learned, P(y|x) = p, "might be [type-1] justified if . . . P(y|x′) = p, where x′ is some selection of values from input-vector x . . ." (sect. 2, para. 4). Suppose x′ is the selection of all values of x – that is, x′ = x. Then it trivially follows that P(y|x) = p if and only if P(y|x′) = p. That is, all conditional probability distributions, including as a special case all functions (even the uncomputable ones), are type-1 learnable. Note that adding the stipulation that x′ cannot include all of x does not help. This can be circumvented by adding irrelevant "dummy" values to each input vector (e.g., a string of 0s), leaving the learning problem just as hard as before. The selection x′ then does not take all elements of the input, because it ignores the dummy values; but, as before, P(y|x) = p if and only if P(y|x′) = p.

3. The problem of novel outputs. From the above, it seems that C&T's definition does not capture their intuitive notion successfully. From the examples they give, it appears they intend that P(y|x′) is not an arbitrary probability distribution, but rather is estimated from frequencies in the input data as F(y, x′)/F(x′), where F(x′) is the number of occurrences of patterns which match x on the selection x′ of values, and F(y, x′) is the number of such patterns associated with output y. But this definition is also inadequate in general, because it means that any novel output y_novel must be assigned probability 0: F(y_novel, x′) = 0 precisely because y_novel has not occurred in the training set. This makes the class of type-1 problems very restrictive. It does not include the identity function (in which each input is mapped to a different, and hence novel, output). C&T thus face a dilemma. If they follow their stated definition, then all conditional probability distributions are type-1 learnable. If they follow the frequency-based analysis they use in the text, then no conditional probability distribution which assigns a nonzero probability to any unseen output is type-1 learnable, which seems drastically restrictive.
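The dilemma can be made concrete with a small sketch. The Python fragment below is an illustration of the frequency-based reading, not code from C&T; the function name type1_estimate and the toy training set are assumptions chosen for the example. It computes F(y, x′)/F(x′) for a given selection of input positions, showing that the full selection x′ = x trivially "justifies" every seen pattern, while the identity function's novel outputs always receive probability 0.

```python
# Illustrative sketch of the frequency-based type-1 estimate.
# Names and data are hypothetical, not from Clark & Thornton.
from typing import Sequence, Tuple

Pattern = Tuple[Tuple[int, ...], int]  # (input vector x, output y)

def type1_estimate(train: Sequence[Pattern],
                   x: Tuple[int, ...],
                   selection: Sequence[int],
                   y: int) -> float:
    """Estimate P(y|x') as F(y, x')/F(x'), where x' is the sub-vector
    of x picked out by the index list `selection`."""
    key = tuple(x[i] for i in selection)
    matches = [(xi, yi) for (xi, yi) in train
               if tuple(xi[i] for i in selection) == key]
    if not matches:
        return 0.0  # no seen pattern agrees with x on the selection
    return sum(1 for (_, yi) in matches if yi == y) / len(matches)

# Toy data: the identity function on 2-bit inputs, outputs coded 0-3.
train = [((0, 0), 0), ((0, 1), 1), ((1, 0), 2)]

# Taking x' = x (the full selection) reproduces the training data,
# so every seen pattern is trivially "type-1 justified":
print(type1_estimate(train, (0, 1), selection=[0, 1], y=1))  # 1.0

# A novel output is always assigned probability 0: the identity
# function's answer for the unseen input (1, 1) is the novel output 3,
# whose frequency count F(y_novel, x') is necessarily 0:
print(type1_estimate(train, (1, 1), selection=[0], y=3))     # 0.0
```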
Furthermore, the frequency-based approach faces the problem that probabilities can never be estimated exactly from a finite amount of data, so that F(y, x′)/F(x′) will not in general equal P(y|x′) = p. The best that such an estimate can be is probably approximately correct, in some sense (e.g., Valiant 1984).

4. What does type-2 learning mean? C&T write that a distribution to be learned, P(y|x) = p, "might be [type-2] justified if . . . P[y|g(x ∈ X) = z] = p, where g is some arbitrary function, x ∈ X is any seen input, and z is the value of function g applied to x" (sect. 2, para. 4). This formulation is difficult to interpret, because it uses notation in an unconventional way. But from the later discussion, the appropriate interpretation appears to be this: the function g maps some subset S of previously seen inputs onto a common output, z. We estimate the conditional probability (presumably that which C&T call "P[y|g(x ∈ X) = z] = p") as the number of members of S which produce output y, divided by the total number of members of S. As with type-1 problems, this means that the conditional probability of all novel outputs must be zero for a problem to be type-2 learnable, for the same reason: the frequency count for novel outputs is necessarily 0. So the identity function is not type-2 learnable either. But worse, any nonnovel output can be justifiably predicted with probability 1. Suppose that a previous input x_prev was paired with the output y_prev. Then define g such that g(x) = z (where x is the novel input), g(x_prev) = z, and g(x_rest) = z + 1 for all other previously seen inputs x_rest. Here g is a "recoding" of the inputs that classifies the novel input x with a single past input x_prev. The subset S defined above has one member, which produced output y_prev, so the estimated conditional probability is 1/1 = 1. Hence the arbitrary output y_prev is justifiably predicted with probability 1. An analogous argument extends not just to a single novel x but to all possible novel x. In short, any function whatever which generalizes from the seen instances to the unseen instances is type-2 learnable, even the noncomputable ones (so long as there are no novel outputs). Note that type-2 problems appear to have the same (rather bizarre) scope as type-1 problems: they are both too broad and too narrow in the same way.
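Again, a minimal sketch may help; the helper names (make_g, type2_estimate) and the toy data below are assumptions for illustration, not C&T's notation. For each previously seen input in turn, it constructs the "gerrymandered" recoding g described above and shows that the corresponding seen output is "justified" for the same novel input with estimated probability 1/1 = 1.

```python
# Illustrative sketch of the type-2 recoding argument.
# Names and data are hypothetical, not from Clark & Thornton.

def make_g(x_novel, x_prev, z=0):
    """A recoding g that lumps the novel input together with one
    chosen past input x_prev and sends every other input to z + 1."""
    def g(x):
        return z if x in (x_novel, x_prev) else z + 1
    return g

def type2_estimate(train, g, z, y):
    """Estimate P[y | g(x) = z]: among seen inputs that g maps to z,
    the fraction paired with output y."""
    s = [(xi, yi) for (xi, yi) in train if g(xi) == z]
    if not s:
        return 0.0
    return sum(1 for (_, yi) in s if yi == y) / len(s)

train = [((0, 0), 0), ((0, 1), 1), ((1, 0), 2)]
x_novel = (1, 1)

# For ANY chosen past input x_prev, the recoding "justifies"
# predicting its output y_prev for the novel input:
for x_prev, y_prev in train:
    g = make_g(x_novel, x_prev)
    print(y_prev, type2_estimate(train, g, z=0, y=y_prev))
# Prints: 0 1.0 / 1 1.0 / 2 1.0 - each seen output gets probability 1.
```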
NOTE
1. The output of a neural network can be viewed as a probability distribution over possible outputs if, for example, outputs are binary and intermediate values are interpreted as probabilities (e.g., Richard & Lippmann 1991). A different approach assumes that outputs are distorted, for example by Gaussian noise; this is useful in understanding learning in Bayesian terms (MacKay 1992). Moreover, some networks implicitly produce conditional probability distributions by generating a distribution of outputs over time (e.g., the Boltzmann machine; Hinton & Sejnowski 1986). None of these approaches seems relevant to C&T's discussion.
